Skip to content

Envoy router verifies ate apiserver's serving cert#237

Open
Zoe Zhao (zoez7) wants to merge 5 commits into
agent-substrate:mainfrom
zoez7:mtls
Open

Envoy router verifies ate apiserver's serving cert#237
Zoe Zhao (zoez7) wants to merge 5 commits into
agent-substrate:mainfrom
zoez7:mtls

Conversation

@zoez7

@zoez7 Zoe Zhao (zoez7) commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Part of #170
Establishes mutual TLS between the atenet router and ate-apiserver, and updates the certificate plumbing it depends on.

Main changes:

  1. The atenet router now verifies ate apiserver' serving certificate, and presents its client cert to ate apiserver. Previously the connection used InsecureSkipVerify.

Minor bug fixes:

  1. Prevent servicednssigner from signing a cert with no DNS SANs. Also updated valkey cluster's cert configuration, because it was relying on the cert with empty DNS.
  • Tests pass
  • Appropriate changes to documentation are included in the PR

@ahmedtd

Copy link
Copy Markdown
Collaborator

Can you talk a little bit about the background of splitting out separate CAs / signers for the workerpools and ate-system components?

If at all possible, I would prefer to restrict us to two signers:

  • One that issues certs with DNS SANs based on the K8s services that a pod is part of (though opt-in with an annotation or label is fine).
  • One that issues SPIFFE certs that encode the k8s service account identity of the pod (and potentially, the pod ID, node ID etc in an additional extension).

The reason is that we can be reasonably confident that signers like these will land in upstream K8s in the near-ish future. The certificates here aren't special or unique to substrate, and they aren't part of the value proposition of the project. So I think we should plan to rebase on native K8s capabilities as they become available. That's easiest if we don't get very custom with the signers.

@zoez7

Copy link
Copy Markdown
Collaborator Author

I think the main motivations for splitting out separate CAs / signers was, if I mount the "podidentity" CA bundle as trusted client CAs in ate-system services, workerpool pods (therefore actors) would also be able to connect to ate-system servers like the Valkey cluster and atelet server.

But after typing it out loud - it seems this needs to be solved anyway (probably in a later milestone) by authorization inside servers such as valkey, instead of using separate CAs. Does that make sense to you?

@ahmedtd

Copy link
Copy Markdown
Collaborator

I think the main motivations for splitting out separate CAs / signers was, if I mount the "podidentity" CA bundle as trusted client CAs in ate-system services, workerpool pods (therefore actors) would also be able to connect to ate-system servers like the Valkey cluster and atelet server.

But after typing it out loud - it seems this needs to be solved anyway (probably in a later milestone) by authorization inside servers such as valkey, instead of using separate CAs. Does that make sense to you?

Yeah --- we should always draw a bright line between authentication and authorization. Assuming that a given client is authorized just because they were able to get a certificate from a particular CA has caused a lot of security problems in the past.

Unfortunately, for the specific case of Valkey we might be limited by its capabilities, in which case it might require a separate signer. Let me check the docs.

@ahmedtd

Copy link
Copy Markdown
Collaborator

It looks like valkey can do it: https://valkey.io/topics/tls/#:~:text=Client%20certificate%20authentication

@zoez7 Zoe Zhao (zoez7) force-pushed the mtls branch 3 times, most recently from 683d085 to 2e34e49 Compare June 16, 2026 01:16
Comment thread cmd/atenet/internal/router/router.go Outdated
Comment thread cmd/atenet/internal/router/router.go Outdated
Comment thread cmd/atenet/internal/router/router.go Outdated
Comment thread cmd/atenet/internal/router/router.go Outdated
Comment thread cmd/atenet/internal/router/router.go Outdated
Comment thread cmd/atenet/internal/router/router.go Outdated
Comment thread cmd/atenet/internal/router/router.go Outdated

@bowei Bowei Du (bowei) left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits

Comment thread cmd/atenet/internal/router/config.go Outdated
}, nil
}

func NewRouterCmd() *cobra.Command {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to cmd.go

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bowei Bowei Du (bowei) left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Router changes look ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants